Estimation of marginal structural models under irregular visits and unmeasured confounder: calibrated inverse probability weights

Clinical information collected in electronic health records (EHRs) is becoming an essential source to emulate randomized experiments. Since patients do not interact with the healthcare system at random, the longitudinal information in large observational databases must account for irregular visits. Moreover, we need to also account for subject-specific unmeasured confounders which may act as a common cause for treatment assignment mechanism (e.g. glucose-lowering medications) while also influencing the outcome (e.g. Hemoglobin A1c). We used the calibration of longitudinal weights to improve the finite sample properties and to account for subject-specific unmeasured confounders. A Monte Carlo simulation study is conducted to evaluate the performance of calibrated inverse probability estimators using time-dependent treatment assignment and irregular visits with subject-specific unmeasured confounders. The simulation study showed that the longitudinal weights with calibrated restrictions improved the finite sample bias when compared to the stabilized weights. The application of the calibrated weights is demonstrated using the exposure of glucose lowering medications and the longitudinal outcome of Hemoglobin A1c. Our results support the effectiveness of glucose lowering medications in reducing Hemoglobin A1c among type II diabetes patients with elevated glycemic index (\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\ge 8.5 \%$$\end{document}≥8.5%) using stabilized and calibrated weights. Supplementary Information The online version contains supplementary material available at 10.1186/s12874-022-01831-2.


Introduction
Prospective randomized experiments are conducted to evaluate the effectiveness of an intervention. However, in the absence of controlled experiments, large data repositories may provide an alternative source to emulate the randomized experiment with the intention to draw causal inference pertaining to the effectiveness of an intervention [1]. Longitudinal data can be collected at fixed (or prespecified) time points or irregular time points during the study follow-up. For example, a randomized controlled study may pre-specify the fixed follow-up visits in advance at regular time points, while the data collection at unequally spaced visit times in an observational data repository may be related to the outcome. Since patients do not interact with the health care system at random, the longitudinal information collected in electronic health records (EHRs) tend to exhibit bias in the form of informative visit regimen [2]. For example, the data collected in EHRs may correspond to an over-representation of patient population with severe or worse disease symptoms when patients are only visiting the clinic with the acute onset of new medical ailment or the management of pre-existing chronic condition. The outcomes and the visits in discrete time-intervals may be correlated in which case the application of standard regression model (e.g. generalized estimating equations) generates biased estimation [3,4]. To correct for this, Pullenayegum and Lim [5] outlined two methods to handle longitudinal data with irregular visits: (i) inverse-intensity weighting and (ii) model-based approach using correlated random effects between the visits and the outcome. Inverse-intensity weighting method uses the measured covariates to capture the correlation between the visit and outcome processes. Similarly, the semi-parametric method assumes the dependency between the visits and outcome through baseline covariates and time-varying latent factors.
Pullenayegum and Lim [5] describe four categories of visit process: (i) fixed visits, (ii) history-dependent protocol visit, (iii) physician-driven visit, (iv) patientdriven visit. We limit the focus of this article to "historydependent protocol visit" in which the protocol specify when the visits should be, but the intervals between visits are allowed to depend on patients' previously observed history [5]. As an example, we may specify the covariate-dependent irregular visits are more frequent in the presence of confounding factors [5]. In our context, history-dependent protocol visit is a reasonable assumption for the management of patients with diabetes, since the patients are expected to visit the primary care clinic routinely depending on their health status, and as recommended by Diabetes Canada guidelines [6]. We describe the irregular visits as the presence of missed visits that do not take place within an interval (i.e. missed visits). We use non-overlapping intervals to discretise the continuous-time processes, and we conceptualize treatment, confounder and outcome as random variables that evolve with study follow-up. We assume that the historydependent protocol visit characterize the missing at random mechanism in which the missing outcome is not dependent on current and future outcome conditional on the past outcome and covariates [5]. We may quantify the extent of visit irregularity by examining the deviation from perfect repeat-measures (i.e. one visit per time interval) using the proportion of missed visits within each time intervals [7].
In the context of longitudinal causal inference, Hernán et al. [8] considered subject-specific visit times using a static observation plan with pre-specified regular times and dynamic observation plan with irregular visit occurrence due to clinical evolution of patients. Hernán et al. [8] demonstrated the need to account for selection bias and confounding arising in estimation of the effects of a dynamic treatment regime. The dependence between the outcome and the visit can arise under various combinations including: (i) conditional independence given past outcome-model covariates, (ii) conditional independence given past observation-time model covariates, (iii) conditional independence given shared latent variables [9]. In this article, the dependence between the treatment and the outcome is induced by an unmeasured (subject-specific) confounder, and thereby treatment is assumed to influence the outcome after conditioning on the observed history [10].
Regression models may yield biased estimation of longitudinal causal effect of time-varying treatment in the presence of time-varying confounder and treatment-confounder feedback [11]. In particular, selection bias is introduced when standard methods (e.g. generalized estimating equations) are used to adjust for the effect of time-dependent confounding. The existence of treatment-confounder feedback (i.e. past treatment affects the current confounder which affects the current treatment) requires the use of causal inference methods to estimate the causal effect of time-dependent treatment [12]. In addition to treatment-confounder feedback, the longitudinal data collected in EHRs may also contain additional feedback with respect to visitconfounder where the past visit affects the current confounder which affects the future visit. Marginal structural models using inverse-probability weights with respect to treatment and visit account for such temporal feedback, and provide consistent estimates of marginal causal effects with correct model specification and without unmeasured confounding factors.

Knowledge gap and motivating example
To our knowledge, the simultaneous existence of treatment-confounder and visit-confounder feedback with unmeasured confounder for each subject in longitudinal settings has not been considered in the causal inference literature. The cumulative-time weights with respect to treatment and visit account for this feedback. Moreover, the treatment weights and visit weights are calibrated to make the non-probability (i.e. observational) sample similar to the target population. In this article, we extend the earlier work by Yiu and Su [13]) in two-folds: (i) extending calibration of longitudinal weights to irregular visits, (ii) incorporating unmeasured confounder for each subject in longitudinal design. The application of this method is demonstrated to assess the effectiveness of glucose lowering medications among diabetes patients with elevated glycemic index (Hemoglobin A1c ≥ 8.5%).

Notation
A framework for longitudinal causal inference is considered for n individuals (i = 1, ..., n) in m i time-intervals (j = 1, ..., m i ) . An equally spaced time interval is denoted as The vector of baseline covariates for individual i is denoted as X i . The latent confounder η i is defined as a common cause of treatment A ij and outcome Y ij for individual i. In some instances, the index for individual i is suppressed because it is assumed that the random vector for each individual i is drawn independently with respect to other individuals. The cumulative treatment status Ā ij , cumulative confounder status L ij , cumulative visit status V ij denotes the complete history of each factor up to and including visit j. As an example, and Lā ,v j as potential outcome and potential confounder (respectively) under treatment regime ā and visit regime v with respect to the j th visit.

Marginal structural model
The marginal causal effects of treatment are specified through a parametric marginal structural model with respect to the longitudinal outcome as where Yā ,v j is the potential outcome indexed with respect to hypothetical treatment ā and hypothetical visit v . The link function is represented using an identity function g −1 (·) . We assume a rank-preserving model in which the net change in potential outcome (i.e. E(Yā ,v j |H * j−1 ) ) is preserved with respect to the treatment and visit for all subjects (i.e. absence of effect modification) [14]. In the context of EHRs, the causal contrast correspond to net change in HbA1c with respect to glucose-lowering medications. We are interested in accounting for three sources biases that distort the causal estimator: (i) measured confounding arising due to time-dependent covariates, (ii) selection bias arising due to irregular visits (i.e. missing at random), and (iii) subject-specific unmeasured confounder.
We use the generalized estimating equations with inverse probability weights to obtain an estimate of the marginal treatment effect with respect to time-dependent covariates. We describe the marginal structural model using the score function of weighted generalized estimating equations as where SW i are the inverse probability weights with stabilizing factor, ν i is the working covariance matrix of Y i and it is specified through working correlation matrix R(α) , and µ i (β) is the mean vector. The correct specification of the correlation matrix R(α) improves the efficiency of estimation while misspecification may still lead to consistent estimators. The longitudinal weights with stabilization factor are constructed to account for treatment-confounder and visit-confounder feedback as where the product terms are defined over cumulative discrete-time intervals. The observed partial history with the exclusion of time-dependent covariates for treatment and visits is denoted as H * j−1 and H * * j−1 . Since the timevarying treatment A j and time-varying visit V j are statistically endogenous, the stabilized inverse probability treatment weights SWĀ t , stabilized inverse probability visit weights SWV t and joint inverse probability weights SWĀ ,V t are required to estimate the marginal causal effect of glucose-lowering medications on hemoglobin A1c. The stabilized treatment weights are used to create a pseudopopulation in which the imbalance for time-dependent covariates across treatment groups is reduced. The stabilized visit weights are used to create a pseudo-population in which the selection bias due to irregular visits (missing at random) is reduced. The pseudo-population generated using the joint weights SWĀ ,V t incorporate both sources of biases (i.e. confounding bias and selection bias).

Assumptions
Longitudinal causal inference with time-varying treatment use the sequential version of identifiability assumptions: (i) latent sequential exchangeability, (ii) sequential postivity and (iii) sequential consistency. The sequential exchangeability assumption is an extension of conditional exchangeability (or no unmeasured confounding) assumption where the probability of treatment assignment and visit occurrence at j th visit depends on past treatment, past visit and covariate history, and conditional on these factors the potential outcome and potential confounder is independent of the treatment assignment. The latent sequential exchangeability assumption (or equivalently latent ignorability assumption) can be described as The sequential positivity assumption is defined as a nonzero probability of treatment assignment and observed visit at each time interval j given the history H j−1 . We describe sequential positivity as P(A j |H j−1 ) > 0 and P(V j |H j−1 ) > 0 for ∀ā,v . The sequential consistency assumption links the potential outcome Yā ,v j and confounder Lā ,v j to the observed outcome and confounder as If causal identifiability assumptions are satisfied then an observational study can be used to mimic a randomized experiment with the limitation that the conditional probability of treatment assignment is unknown and need to be estimated using the data in an observational study. It is further assumed that the administrative censoring C ij is non-informative where censoring is independent of observation times T ij and longitudinal outcome Y ij conditioned on history H j−1 as C ij ⊥ ⊥{Y ij , T ij }|{H j−1 , η i }.

Calibration of inverse probability weights
We motivate the application of calibrated longitudinal weights to estimate the causal effects of glucose lowering medications on hemoglobin A1c. Calibration has been used in causal inference framework to estimate the average treatment effect when regularizing high-dimensional covariates with lasso penalty [15], to minimize the variance of calibrated weights [16], to improve the finite sample properties of maximum likelihood estimation [17], to account for unmeasured cluster-level confounders [18]. Under finite longitudinal sample, stabilized inverse probability weights may not remove the associations between time-dependent treatment A ij and time-dependent covariate L ij conditional on past treatment regimen Ā ij−1 . There may still exist chance imbalances and residual confounding of covariates in the pseudo-population (weighted using SW A ij or SW V ij ) and this may contribute to finite sample estimation errors [19]. The sample estimation errors may further become exacerbated when the treatment model or the visit model are misspecified. In this article, the longitudinal inverse probability treatment and visit weights are calibrated to account for (i) associations between treatment regimen and time-dependent covariates, (ii) associations between irregular visits and time-dependent covariates, (iii) unmeasured subject-specific (i.e. timeinvariant) confounder. The purpose of calibrating the longitudinal weights is to improve the small sample properties of the longitudinal weights [17], while accounting for the unmeasured subject-specific confounder and irregular visits.
The calibration restrictions are employed on inverse probability weights to make the treatment assignment unassociated with the history of the time-varying covariates at each time interval in the pseudo-population. Yiu and Su [17] proposed the calibrated inverse probability weights to improve estimation errors under finite samples using maximum likelihood estimation. Similar to Yiu and Su [17], we calibrate the treatment and visit inverse probability weights using the maximization of weighted log-likelihood functions: after weighting the sample as where c(L ij , ) is the calibration function and reduces to one when the vector of coefficients = 0 (i.e. c(L ij , = 0) = 1) . We assume a non-negative parametric form of c(L ij , ) = exp(K ) , where the matrix K ∈ R N ×r includes the data-dependent restrictions with N = n i=1 m i observations and r columns. The unknown vector of ∈ R r is estimated using the calibration of inverse probability treatment and visit weights.
In similar spirit to Yiu and Su [20], the regression coefficient α w of treatment model are partitioned into {α b , α d } , where α b characterize the baseline distribution (e.g. intercept term) and α d characterize the dependence of treatment assignment and covariates. Likewise, the regression coefficients of irregular visits ω w are partitioned into {ω b , ω d } with the same convention. We maximize the weighted log-likelihood function with respect to through a score equation where the innermost index k accumulates over j time intervals. The dependence of treatment A ij with the time-dependent covariates L ij is represented as α d and it is constrained to be zero while the vector of treatment coefficients (excluding time-dependent covariates) is represented as α b and it is set to the maximum likelihood estimates α . In addition to calibrated treatment weights, the visit weights are calibrated to reduce the association between irregular visits and time-dependent covariates through a score equation where the vector of regression coefficients ω d = 0 denote the independence of time-dependent covariate L i,j−1 and irregular visits V ij (constrained to be zero) while ω b =ω is set to maximum likelihood estimates ω.
In both score Eqs. (3) and (4), we notice that the calibrated weights (i.e. SW A ij ( ) and SW V ij ( ) ) are used to weight the likelihood of treatment model and visit model (respectively) for the i th patient up to and including the time interval j. The calibration restrictions are inverted to estimate the values of unknown coefficients . The calibration restrictions using {α d = 0} and {ω d = 0} ensures that the treatment assignment and irregular visits are statistically exogenous with respect to the time-dependent covariates. Since the covariate balancing restrictions reduce the dependence for treatment assignment and irregular visits with respect to the functional covariate history L ij , we may represent the model-based restrictions (derived in Appendix Section) as and where the model-based propensity scores for treatment model ê A ik and visit model ê V ik are estimated as . The modelbased covariate balancing restrictions are accumulated with respect to longitudinal observations. The residuals for treatment (i.e. A ij −ê A ij ) and irregular visit (i.e. V ij −ê V ij ) are set to be orthogonal with respect to the functional history of time-dependent covariates L ij (including intercept term) in the pseudo-population defined using the calibrated weights.

Unity mean restrictions
The stabilized inverse probability weights used in the pseudo-likelihood function of marginal structural models tend to satisfy the property of unity mean at each time interval (i.e. E(SW A j ) = E(SW V j ) = 1 ∀j ) [21]. However, this property is not guaranteed to hold for calibrated inverse probability weights. Thus, in addition to the covariate balancing score constraints, we further impose the restrictions for average calibrated weights to be one at each time interval as The average treatment weights and visit weights are constrained to be equal to one at each time interval to stabilize the longitudinal weights and to prevent trivial solutions of zero (or negative weights) during the calibration procedure.

Time-invariant latent restrictions
In the presence of subject-level unmeasured confounder η i , the exposed and the unexposed subjects (with respect to treatment) are not conditionally exchangeable given H j−1 because non-causal association between Ā j and Y j cannot be blocked by conditioning on measured history H j−1 . We derive the balancing constraints for subject-specific unmeasured confounder in the context of repeated-measures longitudinal outcomes in Appendix Section. We obtain the empirical constraints to account for the unmeasured individual-level confounder η i using treatment A ij as and using visit V ij as The empirical constraints in Eqs. (5) and (6) are sufficient to describe the covariate balancing restrictions with respect to the time-invariant latent confounder η i . These empirical restrictions balance the time-dependent covariate distribution across treatment groups and visit indication within each time interval in the presence of subject-specific latent confounder.

Simulation study
Kang et al. [22] assessed the performance of various methods that use inverse probability weighting to estimate the causal effect from observational data with incomplete (missing) outcome. In similar spirit to Kang et al. [22] and others [16,23,24], the data generation in this simulation study is designed as an amalgamation of earlier longitudinal settings to evaluate the performance of inverse probability weighting with calibration restrictions.

Estimation of marginal treatment effect
We allow the treatment A j to depend on the evolution of an individual's time-varying covariate(s) L j−1 , and also Ā j−1 . Using the potential outcome framework defined with respect to treatment Ā j and visit V j , the non-null causal effect is estimated using the G formula as where the sum is defined with respect to all possible realizations of {ȳ j−1 ,l j−1 } and the marginal means of potential outcome E(Yā ,v j |X = x) is specified with respect to treatment Ā j =ā j and visit V j =v j . With large number of time intervals, the marginal means of potential outcomes may also be approximated using Markov and stationary assumptions [25].
Apart from observed dependencies between outcome E(Y j |H j−1 ) and treatment E(A j |H j−1 ) using observed history H j−1 , a subject-specific random effect η i generates latent dependency between the outcome E(Y j |H j−1 , η) and the time-dependent treatment E(A j |H j−1 , η) . As detailed in Appendix section, the G-computation can be extended to incorporate the subject-specific random effect η i as The G-computation is a generalization of covariate standardization in the longitudinal setting with timedependent exposure to express the marginal causal effect when the identifiability assumptions are satisfied [26].
The G-computation defines the population-average (i.e. marginal) causal contrast with respect to treatment and visit conditioned on baseline covariates and random effect. We used the G-computation to estimate the marginal quantity of interest (i.e. treatment effect) in marginal structural model.

Data generation
In the conditional data generating mechanism, the most recent values for visit V j−1 , confounder L j−1 and past treatment A j−1 are assumed to influence the longitudinal outcome Y j . The inter-dependencies in the collection of longitudinal information (with respect to discrete time-intervals) give rise to treatmentconfounder feedback and visit-confounder feedback [27,28]. The treatment-confounder feedback (i.e. · · · → A j−1 → L j → A j → · · · ) and visit-confounder feedback (i.e. · · · → V j−1 → L j → V j+1 → · · · ) are defined over discrete time-intervals (as shown in Fig. 1). The baseline covariates X T i are assumed to be measured as ) is a vector of unobserved covariates independently sampled from multivariate standard normal distribution. The outcome Y ij is generated as The data generating mechanism for time-varying covariates L ij , irregular visits V ij and treatment A ij are specified as . Bernoulli distribution is used to generate the time-varying binary factors with respect to the conditional probabilities for V ij , L ij and A ij while the outcome is assumed to be distributed using normal distribution ( Y i ∼ N (�, σ 2 Y ) ). The probabilities of the binary factors (L ij , A ij , V ij ) are described using inverse-logit link function where positive values indicate increased probability of binary factor. The latent factor η i ∼ N (0, 1) is time-invariant for each individual and acts as a common cause for time-varying treatment A ij and outcome Y ij . In the absence of visit (i.e. V ij = 0 ), we assume the continuation of previous treatment A ij and previous covariate L ij , and a change in treatment and covariate is only observed in the presence of visit. Moreover, we specify the outcome Y ij to be missing in the absence of visit. Using the conditional data generating mechanism, the joint distribution is expressed using the causal Markov factorization as The data generating mechanism in the presence of visit and unmeasured (latent) factor η i is described using a directed acyclic graphs in Fig. 1 where red edges denote associations, and black edges denote causal relationship between treatment and outcome.

Estimation procedures
The g-computation formula (in the absence and presence of unmeasured confounder) is used to obtain the true values of marginal treatment effects (see Appendix Section). Since the time-varying treatment A j and visit V j are statistically endogenous in observational studies, the stabilized inverse probability treatment weights (sIPTW), stabilized inverse probability visit weights (sIPVW) and joint stabilized inverse probability weights (sIPTW × sIPVW) with calibration restrictions (described in Section 2.4) are used to estimate the causal effect. The correct specification of the treatment model, visit model and marginal outcome model (with the exclusion of time-dependent confounders) are used in relation to the data generating mechanism. Four simulation scenarios are considered in the simulation study: • We implemented the constrained optimization of the cumulative-time product weights using the Barzilai-Borwein (BB) method [29]. After the estimation of stabilized and calibrated weights, the marginal causal effect of treatment is estimated using the pseudo-likelihood function of generalized estimating equations with AR(1) working correlation structure (geeglm function in R) with stabilized weights and calibrated weights. Based on the data generating mechanism, the correct specification of the marginal structural models for four scenarios were fitted as The absence of irregular visits in scenario 1 and 2 led to the exclusion of V j−1 in the equation above. We also applied a naïve estimator by fitting unweighted generalized estimating equations with AR(1) working correlation structure. The naïve estimator was used as a reference to determine the extent of bias introduced by measured confounding, selecting bias (due to irregular visit) and unmeasured subject-specific confounding within simulated scenarios. Table 1 describes the simulation parameters for the data generating mechanism described in Fig. 1. The performance of inverse probability weight-based estimators is assessed using relative bias, Monte Carlo error (MCE), root mean square error (rMSE) and coverage probability. Relative bias is estimated as the average difference between an estimator and the true

Simulation parameters
for scenario 1 and 2 ψ 0 + ψ 1 a j−1 + ψ 2 v j−1 + ψ 3 X 1 + ψ 4 X 2 + ψ 5 X 3 for scenario 3 and 4.  parameter value. MCE is defined as the standard deviation of an estimator while the rMSE is the square root sum of bias squared and MCE squared. The coverage probability was estimated as the proportion of times when the 95% confidence interval contained the true parameter value. The simulation study is repeated for 1,000 replications using a sample size N = 100 with ten discrete time intervals.

Results
The results of the simulation study are presented for the four scenarios in which the presence or absence of irregular visits and time-invariant latent confounder is considered. The results are summarized using the performance metrics of relative bias, MCE, rMSE, coverage probabilities and successful convergence of calibrated weights, as shown in Table 2.

Scenario 1: regular visits, no unmeasured confounder
We assessed the performance of sIPTW and calibrated IPTW in the absence of irregular visits and unmeasured confounder. An improvement in bias was observed for calibrated IPTW in relation to sIPTW. The MCE and rMSE for calibrated weights was smaller than the stabilized weights, and the coverage probabilities were close to 95% nominal rate. The correlation between stabilized weights and calibrated weights ranged from 0.986 to 0.999. Successful convergence of calibrated weights was observed for all 1,000 replicates.

Scenario 2: regular visits, unmeasured confounder
In this scenario, we assessed the performance of sIPTW and calibrated IPTW in the absence of irregular visits, and in the presence of unmeasured confounder. Nominal coverage rates close to 95% were observed for marginal structural models with sIPTW and calibrated IPTW. The calibrated weights had smaller mean bias than stabilized weights. The calibrated weights also had smaller MCE and rMSE than stabilized weights. The correlation between stabilized weights and calibrated weights ranged from 0.929 to 0.999. Successful convergence was observed for 959 out of 1,000 replicates.

Scenario 3: irregular visits, no unmeasured confounder
In this scenario, we assessed the performance of stabilized and calibrated variants of treatment and visit weights in the presence of irregular visits and absence of unmeasured confounder. The coverage probabilities were close to the expected 95% nominal rate for the four estimators: (i) sIPTW, (ii) sIPVW, (iii) joint sIPTW and sIPVW, (iv) and calibrated IPTW and IPVW. Smaller bias was observed for calibrated weights when compared to the stabilized weights. The correlation between stabilized weights and calibrated weights ranged from 0.25 to 0.99, and successful convergence was observed for all 1,000 replicates.

Scenario 4: irregular visits, unmeasured confounder
In this scenario, we assessed the performance of stabilized and calibrated variants of treatment and visit weights in the presence of irregular visits and unmeasured confounder. The coverage probabilities were close to the expected 95% nominal rate for all estimators. A reduction in bias was observed for calibrated IPTW and IPVW estimators when compared with sIPTW and sIPVW estimators. The correlation between stabilized weights and calibrated weights ranged from 0.24 to 0.99, and successful convergence was recorded for 966 out of 1,000 replicates.

Application: emulating randomized experiment using EHRs
Electronic health records (EHRs) can serve as a complete lifetime record of a patient's longitudinal health trajectory, and they can be collected from different sources including hospitals, specialist clinics, primary care providers, pharmacies, and laboratories. University of Toronto practice based research network (UTOPIAN) collects de-identified patient-level medical information from EHRs of primary care practices across the Greater Toronto region [30,31]. The EHRs in UTOPIAN database contains patient-level demographics, medical diagnosis, procedures, medications, immunization, laboratory test results, vital signs and risk factors. Even though the UTOPIAN repository is a rich source of deidentified patient-level medical information, it is prone to many sources of bias including informed presence of patients due to acute onset of new medical ailment or the management of pre-existing chronic condition. In our context, we used the irregular visits in longitudinal follow-up to describe the informed presence of diabetes patients in primary care.

Glucose lowering medications
Diabetes is one of the most common chronic conditions in which the body cannot properly use insulin produced by pancreas (Type II). Hemogloblin A1c (HbA1c) is an important glycemic marker to reduce the incidence of diabetes-related complications and mortality [32]. More than 90% of type II diabetes patients eventually require more than metformin monotherapy to achieve their target for optimal glucose control. Dual or combination therapy may become necessary when metformin is insufficient. The second-line options for glucose management include sodium-glucose co-transporter 2 inhibitors (SGLT-2i) and dipeptidyl peptidase-4 inhibitors (DPP-4i) medications. SGLT-2i medications are associated with reduction in cardiovascular outcomes and decreased progression of renal disease [6], while DPP-4i medications are known to have highly favourable tolerability among elderly patients [33]. We assess the effectiveness of SGLT-2i and DPP-4i drugs using HbA1c as the glycemic marker among patients with type II diabetes. We assume intention-to-treat design for time-varying treatment assignment where the analysis are based on the treatment assignment (i.e. drug prescription) rather than the treatment eventually received (i.e. drug dispensation).

Cohort generation
We emulate a randomized experiment using an observational setting, and describe several elements of emulating the target trial in Table 3. Patients are enrolled in the longitudinal cohort from January 01 2018 when the following conditions are satisfied: (i) patient is at least 18 years of age; (ii) patient has a clinical indication for diabetes [34]; (iii) HbA1c ≥ 8.5% is recorded within the study period. Patient follow-up starts when these eligibility criteria (i)-(iii) are met. Patients are administratively censored on September 30 2021 or censored at mid-calendar (June 30) when deceased year is recorded. The enrollment period is terminated on January 01 2020 while the study follow-up is terminated on September 30 2021. Once the patient is enrolled in the cohort, three month time-intervals are generated under the assumption that the patient regularly visits the clinic on quarterly basis (i.e. every 3 months) as recommended by Diabetes Canada guidelines [6]. The presence of billing record for any primary care services within a quarter for a given patient indicate V ij = 1 .
As an example, V ij = 0 if no billing record is available within an index quarter and V ij = 1 if patient has one or more billing records within an index quarter. The time-dependent exposures A ′ ij and A ′′ ij (prescription for SGLT-2i and DPP-4i medications), time-dependent confounders L ij (co-morbidities and glucose lowering medications, respectively), and continuous outcome Y ij (Hemoglobin A1c) are assumed to be constant within each index quarter. In the case of multiple visits within each quarter, positive values of A ′ ij , A ′′ ij and L ij take precedence while an average value of Y ij is computed for each patient within each quarter. Once the positive indication for a co-morbidity is recorded, the patient is assumed to have the co-morbidity for the remainder of the study period. Patients who had an earlier prescription for SGLT-2i or DPP-4i medications three years prior to January 01 2018 were excluded. The three-year look back window reduced the possibility of selection bias by left truncating those individuals who initiated the secondary treatment (using SGLT-2i or DPP-4i) prior to meeting the eligibility criteria.

Marginal structural model
The effectiveness of glucose lowering medications was assessed among diabetes patients who were prescribed SGLT-2i and/or DPP-4i medications during the study period. Longitudinal marginal structural models using generalized estimating equations (AR-1 correlation) with stabilized and calibrated weights were used to generate covariate balance with respect to treatment assignment (i.e. SGLT-2i and DPP-4i medications) and irregular visits. We fitted the following marginal structural model: where j denotes the cumulative number of prescriptions for glucose lowering medications, truncated at two prescriptions. The marginal structural model did not include the time-dependent covariates L ij as they were accounted for using the longitudinal weights. The pseudo-populations were defined using the stabilized weights with their calibrated counterparts defined as ; ; ; calibration restrictions included the balancing conditions for the orthogonality constraint (i.e. (A ik −ê A ik )⊥ ⊥L ik−1 and (V ik −ê A ik )⊥ ⊥L ik−1 ), and unity constraints (i.e. E(SW j ) = 1 ∀j ). We truncated the stabilizing and calibrated weight functions at 1% and 99% quantiles to improve the estimation of the marginal effects [36].

Effectiveness of glucose lowering medications
We describe the effectiveness of glucose lowering medications among diabetes patients with elevated HbA1c (i.e. ≥ 8.5% ) using the stabilized (product) weights and the calibrated weights. The pseudo-population characterized using the product weights (Fig. 3) and calibrated weights (Fig. 4) were used to describe the net change in mean HbA1c with respect to the patient demographics, geographical characteristics and treatment assignment. The stabilized (product) weights ranged from 0.003 to 20.808 with mean value of 1.381 while the calibrated weights ranged from 0.003 to 13.045 with mean value of 1.002 (see Fig. 2). The correlation between stabilized and calibrated weights was 0.985. Using the calibrated weights, we note that older patients have lower mean HbA1c than younger patients (e.g. 65 − 79 years vs. 18 − 34 years: −0.63% (95% CI: −1.02% to −0.24% , P-value < 0.001 )). The mean HbA1c is lower among male patients than female patients ( −0.16% (95% CI: −0.31% to 0.00% , P-value = 0.049)). The mean HbA1c is lower among patients residing in highest income quintiles than lowest income quintiles ( −0.26 % (95% CI: −0.49% to −0.04% , P-value = 0.02)). We note that the mean HbA1c is reduced with glucose lowering medications (SGLT-2i and DPP-4i). For example, the mean HbA1c is reduced by −0.38% (95% CI: −0.57% to −0.19% , P-value < 0.001 ) with a single prescription of SGLT-2i medication while it is reduced by −0.28% (95% CI: −0.45% to −0.10% , P-value< 0.001 ) with a single prescription of DPP-4i medication. A reduction of −0.78% (95% CI: −0.87% to −0.69% ) in mean HbA1c is observed with primary care visit, indicating improved regulation of blood glucose. Similar findings were observed for stabilized weights (when compared to calibrated weights) as the inference did not change drastically (see Figs. 3 and 4).

Discussion
The longitudinal data in EHRs feature several methodological complexities. For example, the irregular visits in EHRs are often recorded in which the longitudinal outcome is measured sporadically across time-intervals, and it may depend on patients' observed history (i.e. missing at random) [5]. Furthermore, the presence of time-invariant latent confounders in EHRs may generate biased estimation of treatment effect [37].
In this article, we proposed calibrated weights to estimate the effect of time-dependent treatment on longitudinal outcomes with irregular visits and with unmeasured confounding. We derived the calibrated restrictions for subject-specific latent confounder which acts as a common cause for treatment and outcome. In our simulation study, the effect of time-varying treatment propagated towards the longitudinal outcome in multiple pathways: (i) direct effect of treatment on outcome, (ii) indirect effect of treatment through the time-varying covariate, (iii) indirect effect of treatment through the irregular visits. The proposed weights created a pseudo-population in which the irregular visits were exogenous (using the visit weights), and in which the treatment groups were exchangeable (using the treatment weights). Marginal structural model with inverse probability weights with respect to treatment accounted for confounding bias in treatment-confounder feedback while the inverse probability weights with respect to irregular visit accounted for selection bias in visit-confounder feedback [8]. An improvement in finite sample properties (e.g. reduced bias) was observed with the application of calibrated weights. We further employed the calibrated weights to assess the effect of cumulative prescription of glucose lowering medications (SGLT-2i and DPP-4i) on Hemoglobin A1c. Similar to calibrated weights, Imai and Ratkovic [24] proposed a methodology for covariate balancing weights which improved performance by reducing the mean square error under correct and incorrect model misspecification. Covariate balancing weights are computationally intensive and may only accommodate small number of follow-up visits and covariates in longitudinal settings due to increase in the total number of moment conditions for weight estimation [13]. Even when the causal identifiability assumptions hold, biased estimation of average treatment effect is possible for longitudinal marginal structural models or g-computation when the parametric models are misspecified. In the earlier literature [20] and in this article, the calibrated restrictions on inverse probability weights improved the estimation error under finite sample when compared to maximum likelihood estimation.
Young et al. [38] discretized continuous-time and used pooled logistic regression with short intervals to estimate the causal effect of treatment on longitudinal outcome with rare occurrence. If we consider binary repeatedmeasures outcome (e.g. elevated Hemoglobin A1c) then we may specify short discrete time intervals with negligible event rate where the time-dependent treatment is influenced with respect to the observed history. Varying the length of time-intervals may also describe the extent of irregularity in follow-up visits [7]. Furthermore, in this article, attention was limited to irregular visits where the entire study population was assumed to follow a single type of homogeneous process. As alluded by Neuhaus et al. [39], a more realistic approach could be to consider a combination of regular and irregular visits in the longitudinal cohort using the outcome-visit dependency. One approach to incorporate this could be to cluster the patient profiles (with respect to irregularities in visits) using latent class analysis prior to assessing the effectiveness of glucose lowering medications [32].
In this article, we made the assumption of sequential positivity with respect to the treatment and irregular visits. Sequential positivity assumption may become implausible in some situations when the clinical profile of a patient precludes them from receiving SGLT-2i or DPP-4i medications, or precludes them from visiting the primary care clinic for some time-intervals. As an extension when the sequential postivity assumption is unrealistic, we may employ doubly robust estimators to reduce bias and imprecision due to large weights. Doubly robust estimators may improve the robustness against model misspecification when either the treatment model or the counterfactual outcome model is misspecified [40]. For example, Pullenayegum and Feldman [41] described the doubly robust estimator which is robust to the misspecification of the visit model (using the inverse intensity weights) and imputation model (using increment estimator). Future research may focus on the extension of calibrated estimators to accommodate for the misspecification of treatment or outcome models using doubly robust estimation [20]. The proposed estimators with calibrated restrictions also relied on the assumption of sequential exchangeability where all potential confounders are measured at each time-interval (i.e. no unmeasured time-variant confounders). In a situation where the sequential exchangeability assumption is violated, Coulombe et al. [42] and Streeter et al. [43] describe sensitivity analysis to assess the extent to which unmeasured confounding can affect the marginal effect of the treatment. As an extension to this article, sensitivity analysis may become necessary if we consider unmeasured timedependent covariates in longitudinal settings.

Conclusion
We demonstrated the application of calibrated weights to assess the effect of irregular visits and to assess the efficacy of glucose lowering medications (SGLT-2i and DPP-4i) on the longitudinal trajectory of Hemoglobin A1c. The empirical results showed a reduction in mean Hemoglobin A1c with cumulative prescriptions of SGLT-2i and DPP-4i medications using primary care EHRs. These findings support the earlier results from a meta-analysis in which SGLT-2i medications had stronger efficacy than DPP-4i medications [44]. In general, stronger efficacy of one treatment (e.g. SGLT-2i medications) in reducing Hemoglobin A1c is not a clinical rationale for selecting a particular treatment (e.g. DPP-4i medications). Instead, the treatment for glycemic control in type II diabetes patients is often intertwined with several clinical characteristics including weight change, blood pressure, cardiovascular profile, renal profile, and other safety profiles [45].